A significant portion of the checking was done in the Excel file 'manual verification.'

In essence, I searched the state's site (https://apps.state.or.us/cf2/spd/facility_complaints/) using the same criteria as the scraper, then pasted the resulting per-facility totals into a spreadsheet. I summed them and compared the result with what the scraper collected.
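
The comparison described above comes down to one sum and one row count. Here is a minimal sketch on made-up stand-in data. `manual_demo` and `scraped_demo` are hypothetical, not the real files. It assumes the spreadsheet has one row per facility with a `count` column and the scraped data has one row per complaint:

```python
import pandas as pd

# Hypothetical stand-in for the manual spreadsheet: one row per facility.
manual_demo = pd.DataFrame({"name": ["A HOME", "B HOME"], "count": [3, 2]})

# Hypothetical stand-in for the scraped data: one row per complaint.
scraped_demo = pd.DataFrame({
    "fac_name": ["A HOME", "A HOME", "A HOME", "B HOME"],
    "abuse_number": ["X1", "X2", "X3", "Y1"],
})

manual_total = int(manual_demo["count"].sum())  # total listed on the site
scraped_total = len(scraped_demo)               # complaints actually scraped
total_gap = manual_total - scraped_total        # positive: site lists more

print(manual_total, scraped_total, total_gap)
```

A nonzero gap only says that something differs. The per-facility merge further down is what locates it.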

There were some differences between the two.

  1. The overall totals were off by 5. I checked manually: the totals listed on the state's site didn't match the number of complaints actually shown on the facility pages.
  2. The number of complaints per facility type did not match. I don't know why, but I don't see it as a problem: the overall totals are accurate, and we don't need the scraped data to carry the right facility type.
  3. Four facilities didn't join on 'name'. I checked each one, and each had the right number of complaints scraped.
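
The first and third differences can be separated in one pass with an outer merge and pandas' `indicator=True` option. This is a sketch on hypothetical stand-in data (`manual_demo`, `scraped_counts_demo`), not the notebook's own cells, which use a left and a right merge instead:

```python
import pandas as pd

# Hypothetical per-facility totals copied from the site.
manual_demo = pd.DataFrame({
    "name": ["A HOME", "B HOME", "C HOME ALF"],
    "count": [3, 2, 4],
})

# Hypothetical per-facility counts from the scraper.
# "C  HOME ALF" has a double space, so it will not join on name.
scraped_counts_demo = pd.DataFrame({
    "fac_name": ["A HOME", "B HOME", "C  HOME ALF"],
    "abuse_number": [3, 1, 4],
})

both = manual_demo.merge(
    scraped_counts_demo,
    how="outer",
    left_on="name",
    right_on="fac_name",
    indicator=True,  # adds a "_merge" column: left_only / right_only / both
)

# Facilities present on only one side (join failures).
unmatched = both[both["_merge"] != "both"]

# Facilities that joined but whose counts disagree.
matched = both[both["_merge"] == "both"]
mismatched = matched[matched["count"] != matched["abuse_number"]]
```

Keeping join failures apart from true count mismatches avoids treating `NaN != number` rows as count errors.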

In [5]:
import pandas as pd
import numpy as np
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))



In [6]:
scraped_comp = pd.read_csv('../data/scraped/scraped_complaints_3_25.csv')

In [7]:
# Upper-case the complaint IDs so they compare consistently.
scraped_comp['abuse_number'] = scraped_comp['abuse_number'].str.upper()

In [8]:
manual = pd.read_excel('/Users/fzarkhin/OneDrive - Advance Central Services, Inc/fproj/github/database-story/scraper/manual verification.xlsx', sheet_name='All manual')

In [9]:
manual = manual.groupby('name').sum().reset_index()

In [10]:
# Strip leading and trailing whitespace from facility names before joining.
manual['name'] = manual['name'].str.strip()
scraped_comp['fac_name'] = scraped_comp['fac_name'].str.strip()

In [11]:
df = scraped_comp.groupby('fac_name').count().reset_index()[['fac_name','abuse_number']]

In [12]:
merge1 = manual.merge(df, how = 'left', left_on = 'name', right_on='fac_name')

Five facilities had counts that did not match. Manual checks show that the online data is inaccurate.


In [13]:
merge1[merge1['count']!=merge1['abuse_number']].sort_values('abuse_number')#.sum()


Out[13]:
     name                                         count  fac_name                                     abuse_number
434  KING CITY REHABILITATION & LIVING CENTER     7      KING CITY REHABILITATION & LIVING CENTER     6.0
295  ENCORE SENIOR VILLAGE AT PORTLAND            9      ENCORE SENIOR VILLAGE AT PORTLAND            8.0
310  FAIR VIEW TRANSITIONAL HEALTH CENTER         10     FAIR VIEW TRANSITIONAL HEALTH CENTER         9.0
769  ST. ELIZABETH HEALTH SERVICES                10     ST. ELIZABETH HEALTH SERVICES                9.0
207  COLUMBIA CARE CENTER                         15     COLUMBIA CARE CENTER                         14.0
 74  AVAMERE AT SANDY ASSISTED LIVING FACILITY    8      NaN                                          NaN
325  FLAGSTONE RETIREMENT & ASSISTED LIVING       4      NaN                                          NaN
527  MILL CREEK POINT ASSISTED LIVING RESIDENCE   1      NaN                                          NaN
640  PRINCETON VILLAGE ASSISTED LIVING RESIDENCE  11     NaN                                          NaN

In [14]:
manual[manual['name']=='AVAMERE AT SANDY']


Out[14]:
    name              count
73  AVAMERE AT SANDY  8

In [15]:
scraped_comp[scraped_comp['abuse_number']=='BH116622B']


Out[15]:
      abuse_number  city_name  fac_name                                     fac_type  inv_comp_date  online_incident_date
4214  BH116622B     CLACKAMAS  PRINCETON VILLAGE ASSISTED LIVING RESIDENCE  ALF       11/22/2011     03/21/2011

In [16]:
scraped_comp[scraped_comp['fac_name'].str.contains('FLAGSTONE RETIREME')]


Out[16]:
      abuse_number  city_name   fac_name                                fac_type  inv_comp_date  online_incident_date
3526  DL150178      THE DALLES  FLAGSTONE RETIREMENT & ASSISTED LIVING  ALF       04/15/2015     02/05/2015
3527  DL133268      THE DALLES  FLAGSTONE RETIREMENT & ASSISTED LIVING  ALF       11/06/2013     05/18/2013
3528  DL121008      THE DALLES  FLAGSTONE RETIREMENT & ASSISTED LIVING  ALF       11/15/2012     09/05/2012
3529  DL061357      THE DALLES  FLAGSTONE RETIREMENT & ASSISTED LIVING  ALF       05/03/2006     03/19/2006

In [17]:
merge2 = manual.merge(df, how = 'right', left_on = 'name', right_on='fac_name')

In [18]:
merge2[merge2['count']!=merge2['abuse_number']].sort_values('count')#.sum()


Out[18]:
     name                                         count  fac_name                                     abuse_number
432  KING CITY REHABILITATION & LIVING CENTER     7.0    KING CITY REHABILITATION & LIVING CENTER     6
294  ENCORE SENIOR VILLAGE AT PORTLAND            9.0    ENCORE SENIOR VILLAGE AT PORTLAND            8
309  FAIR VIEW TRANSITIONAL HEALTH CENTER         10.0   FAIR VIEW TRANSITIONAL HEALTH CENTER         9
765  ST. ELIZABETH HEALTH SERVICES                10.0   ST. ELIZABETH HEALTH SERVICES                9
206  COLUMBIA CARE CENTER                         15.0   COLUMBIA CARE CENTER                         14
843  NaN                                          NaN    AVAMERE AT SANDY ASSISTED LIVING FACILITY    8
844  NaN                                          NaN    FLAGSTONE RETIREMENT & ASSISTED LIVING       4
845  NaN                                          NaN    MILL CREEK POINT ASSISTED LIVING RESIDENCE   1
846  NaN                                          NaN    PRINCETON VILLAGE ASSISTED LIVING RESIDENCE  11
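
Names that print identically can still fail to join if they differ in ways the display hides, such as repeated internal spaces or letter case. The notebook does not establish that this is what happened with the four unmatched facilities, so the following is only a hypothetical diagnostic. `normalize_name` is an illustrative helper, and the sample strings are made up:

```python
import re

import pandas as pd


def normalize_name(name: str) -> str:
    """Collapse runs of whitespace, trim the ends, and upper-case."""
    return re.sub(r"\s+", " ", name).strip().upper()


# Made-up examples that look alike but are not equal as raw strings.
names = pd.Series([
    "Flagstone  Retirement & Assisted Living ",
    "FLAGSTONE RETIREMENT & ASSISTED LIVING",
])

normalized = names.map(normalize_name)
print(names.nunique(), normalized.nunique())
```

If normalizing both name columns before the merge makes the unmatched rows join, the cause was formatting rather than missing data.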

Verdict: Scrape is good.